Abstract

This paper concerns randomized leader election in synchronous distributed networks. A distributed
leader election algorithm is presented for complete $n$-node networks that runs in $O(1)$ rounds and (with
high probability) uses only $O(\sqrt{n}\log^{3/2} n)$ messages to elect a unique leader (with high probability).
This algorithm is then extended to solve leader election on any connected non-bipartite $n$-node graph
$G$ in $O(\tau(G))$ time and $O(\tau(G)\sqrt{n}\log^{3/2} n)$ messages, where $\tau(G)$ is the mixing time of a random
walk on $G$. The above result implies highly efficient (sublinear running time and messages) leader
election algorithms for networks with small mixing times, such as expanders and hypercubes. In contrast,
previous leader election algorithms had at least linear message complexity even in complete graphs.
Moreover, super-linear message lower bounds are known for time-efficient deterministic leader election
algorithms. Finally, an almost-tight lower bound is presented for randomized leader election, showing
that $\Omega(\sqrt{n})$ messages are needed for any $O(1)$ time leader election algorithm which succeeds with high
probability. It is also shown that $\Omega(n^{1/3})$ messages are needed by any leader election algorithm that
succeeds with high probability, regardless of the number of rounds. We view our results as a step
towards understanding the randomized complexity of leader election in distributed networks.

1 Introduction 

Background and motivation. Leader election is a classical and fundamental problem in distributed
computing. It originated as the problem of regenerating the "token" in a local area token ring network [19] and
has since then "starred" in major roles in problems across the spectrum, providing solutions for reliability
by replication (or duplicate elimination), for locking, synchronization, load balancing, maintaining group
memberships and establishing communication primitives. As an example, the content delivery network
giant Akamai uses decentralized and distributed leader election as a subroutine to tolerate machine failure and
build fault tolerance in its systems [23]. In many cases, especially with the advent of large scale networks
such as peer-to-peer systems [27, 28, 32], it is desirable to achieve low cost and scalable leader election,
even though the guarantees may be probabilistic.

Informally, the problem of distributed leader election requires a group of processors in a distributed
network to elect a unique leader among themselves, i.e., exactly one processor must output the decision that
it is the leader, say, by changing a special status component of its state to the value leader [20]. All the rest
of the nodes must change their status component to the value non-leader. These nodes need not be aware of
the identity of the leader. This implicit variant of leader election is rather standard (cf. [20]), and is sufficient
in many applications, e.g., for token generation in a token ring environment. This paper focuses on implicit
leader election.

In the explicit variant, all the non-leaders change their status component to the value non-leader, and
moreover, every node must also know the identity of the unique leader. This formulation may be necessary in
problems where nodes coordinate and communicate through a leader, e.g., implementations of Paxos [17, 18].
In this variant, there is an obvious lower bound of $\Omega(n)$ messages (throughout, $n$ denotes the number of
nodes in the network) since every node must be informed of the leader's identity. This explicit leader
election can be achieved by simply executing an (implicit) leader election algorithm and then broadcasting
the leader's identity using an additional $O(n)$ messages and $O(D)$ time (where $D$ is the diameter of the
graph).

The complexity of the leader election problem and algorithms for it, especially deterministic algorithms
(guaranteed to always succeed), have been well-studied. Various algorithms and lower bounds are known
in different models with synchronous/asynchronous communication and in networks of varying topologies
such as a cycle, a complete graph, or some arbitrary topology (e.g., see [12, 20, 24, 29, 31] and the references
therein). The problem was first studied in the context of a ring network by Le Lann [19] and discussed for
general graphs in the influential paper of Gallager, Humblet, and Spira [8]. However, the class of complete
networks has come to occupy a special position of its own and has been extensively studied (T] [lOl [131 H21 

ED. 

The study of leader election algorithms is usually concerned with both message and time complexity.
For complete graphs, Korach, Moran and Zaks [14] and Humblet [10] presented $O(n \log n)$ message
algorithms. Korach, Kutten, and Moran [13] developed a general method decoupling the issue of the graph
family from the design of the leader election algorithm, allowing the development of message-efficient
leader election algorithms for any class of graphs, given an efficient traversal algorithm for that class. When
this method was applied to complete graphs, it yielded an improved (but still $\Omega(n \log n)$) message
complexity. Afek and Gafni [1] presented asynchronous and synchronous algorithms, as well as a tradeoff between
the message and the time complexity of synchronous deterministic algorithms for complete graphs in the
non-simultaneous wake-up model: the results varied from an $O(1)$-time, $O(n^2)$-messages algorithm to an
$O(\log n)$-time, $O(n \log n)$-messages algorithm. Singh [30] showed another trade-off that saved on time,
still for algorithms with a super-linear number of messages. (Sublinear time algorithms were shown in [30]
even for $O(n \log n)$ messages algorithms, and even lower times for algorithms with higher message
complexities.) Afek and Gafni, as well as Korach, Moran, and Zaks [14, 16] showed a lower bound of $\Omega(n \log n)$
messages for deterministic algorithms in the general case. Multiple studies showed a different case where it
was possible to reduce the number of messages to $O(n)$ by using a sense of direction: essentially, assuming
some kind of a virtual ring, superimposed on the complete graph, such that the order of nodes on the ring is
known to the nodes [6]. The above results demonstrate that the number of messages needed for deterministic
leader election is at least linear if nodes wake up simultaneously, or even super-linear (i.e., $\Omega(n \log n)$) if
nodes are woken up by the adversary. In this paper, we focus on simultaneous wake-up.

Nevertheless, in this paper we also show that our algorithms yield sublinear message complexity even in
the case where the adversary can wake up nodes at arbitrary times, which is a significant improvement over
the $\Theta(n \log n)$ bound required for deterministic algorithms.

At its core, leader election is a symmetry breaking problem. For anonymous networks under some 
reasonable assumptions, deterministic leader election was shown to be impossible O (using symmetry 
concerns). Randomization comes to the rescue in this case; random rank assignment is often used to assign 
unique identifiers, as done herein. Randomization also allows us to beat the lower bounds for deterministic 
algorithms, albeit at the risk of a small chance of error. 

Randomized asynchronous (explicit) leader election algorithms for various networks were presented by
Itai and Rodeh, Schieber and Snir, and Afek and Matias [11, 5, 2]. In particular, one of the algorithms elects
a leader in a complete graph with $O(n)$ messages and $O(\log n)$ time [2]. The probability of error there tends
to zero when $n$ grows to infinity but is not given explicitly. A randomized leader election algorithm (for the
explicit version) that could err with probability $O(1/\log^{\Omega(1)} n)$ was presented recently in EoTf^l with time
$O(\log n)$ and linear message complexity. That paper also surveys some related papers about randomized
algorithms in other models that use more messages for performing leader election [9] or related tasks (e.g.,
probabilistic quorum systems, Malkhi et al. [21]). In the context of self-stabilization, a randomized algorithm
with $O(n \log n)$ messages and $O(\log n)$ time until stabilization was presented in [33].

1.1 Our Main Results 

The main focus of this paper is to study how randomization can help in improving the complexity of leader
election, especially message complexity in synchronous networks. We first present an (implicit) leader
election algorithm for a complete network that runs in $O(1)$ time and uses only $O(\sqrt{n}\log^{3/2} n)$ messages to
elect a unique leader (with high probability²). This is a significant improvement over the linear number of
messages required by any deterministic algorithm (in the simultaneous wake-up model).

We also show that our algorithm works in the non-simultaneous wake-up model too, where the gap is even
larger, since any deterministic algorithm requires $\Omega(n \log n)$ messages in that model. For the
explicit variant of the problem, our algorithm can be extended to use (w.h.p.) $O(n)$ messages and $O(1)$ time.

We then extend this algorithm to solve leader election on any connected (non-bipartite³) $n$-node graph $G$
in $O(\tau(G))$ time and $O(\tau(G)\sqrt{n}\log^{3/2} n)$ messages, where $\tau(G)$ is the mixing time of a random walk on $G$.
The above result implies highly efficient (sublinear running time and messages) leader election algorithms
for networks with small mixing time. In particular, for important graph classes such as expanders (used, e.g.,
in modeling peer-to-peer networks El), which have logarithmic mixing time, it implies an $O(\log n)$ time and
$O(\sqrt{n}\log^{5/2} n)$ messages algorithm, and for hypercubes, which have a mixing time of $O(\log n \log\log n)$,
it implies a sublinear $O(\log n \log\log n)$ time and $O(\sqrt{n}\log^{5/2} n \log\log n)$ messages algorithm.

For our algorithms, we assume that the communication is synchronous and follows the standard CONGEST
model [25], where in each round a node can send at most one message of size $O(\log n)$ bits on a single edge.
For our algorithm on general graphs, we also assume that the nodes have an estimate of the mixing time.
We do not, however, assume that the nodes have unique IDs; hence the algorithms in this paper also work for
anonymous networks. We assume that all nodes wake up simultaneously at the beginning of the execution.

(Additional details on our distributed computing model are given later.)

¹ In contrast, the probability of error in the current paper is $O(1/n^{\Omega(1)})$.
² Throughout, "with high probability (whp)" means with probability $\ge 1 - 1/n^{\Omega(1)}$.
³ Our algorithm can be modified to work for bipartite graphs as well (cf. Section 3).

Finally, we show that, in general, it is not possible to improve over our algorithm substantially, by
presenting an almost-tight lower bound for randomized leader election. We show that $\Omega(\sqrt{n})$ messages
are needed for any $O(1)$ time leader election algorithm in a complete network which succeeds with high
probability. It is also shown that $\Omega(n^{1/3})$ messages are needed by any leader election algorithm that succeeds
with high probability, regardless of the number of rounds. These lower bounds hold even in the LOCAL
model [25], where there is no restriction on the number of bits that can be sent on each edge in each round.
To the best of our knowledge, these are the first non-trivial lower bounds for randomized leader election.

1.2 Technical Contributions 

The main algorithmic tool used by our randomized algorithm involves reducing the message complexity
via random sampling. For general graphs, this sampling is implemented by performing random walks.
Informally speaking, a small number of nodes (about $O(\log n)$), which are the candidates for leadership,
initiate random walks. We show that if a sufficient number of random walks are initiated (about $\sqrt{n \log n}$),
then there is a good probability that random walks originating from different candidates meet (or collide) at
some node which acts as a referee. The referee notifies a winner among the colliding random walks. The
algorithms use a birthday paradox type argument to show that a unique candidate node wins all competitions
(i.e., is elected) with high probability. An interesting feature of that birthday paradox argument (for general
graphs) is that it is applied to a setting with non-uniform selection probabilities. See Section 2 for a simple
version of the algorithm that works on a complete graph. The algorithm of Section 3 is a generalization of
the simple algorithm of Section 2 that works for any connected graph; however, the algorithm and analysis
are more involved.

The main intuition in our lower bound proof for randomized leader election is that any
algorithm which sends fewer messages than required by our lower bound has a good chance of generating runs
where there are multiple potential leader candidates in the network that do not influence each other. In other
words, the probability of such "disjoint" parts of the network to elect a leader is the same, which implies that
there is a good probability that more than one leader is elected. Although this is conceptually easy to state,
it is technically challenging to show formally since our result applies to all randomized algorithms without
further restrictions.

1.3 Distributed Computing Model 

The model we consider is similar to the models of [I] [lOj [13] [15] [GO, with the main addition of giving 
processors access to a private unbiased coin. Also, we do not assume unique identities. We consider a system 
of n nodes, represented as an undirected (not necessarily complete) graph G = (V, E). Each node runs an 
instance of a distributed algorithm that has knowledge of n. The computation advances in synchronous 
rounds, where, in every round, nodes can send messages, receive messages that were sent in the same round 
by neighbors in G, and perform some local computation; every node has access to the outcome of unbiased 
private coin flips. The messages are the only means of communication; in particular, nodes cannot access 
the coin flips of other nodes, and do not share any memory. Throughout this paper, we assume that all nodes
are awake initially and simultaneously start executing the algorithm; we discuss some relaxations of this
point in a separate section.
Leader Election.

We now formally define the leader election problem. Every node $u$ has a special variable $\mathit{status}_u$ that
it can set to a value in the set $\{\bot, \text{NON-ELECTED}, \text{ELECTED}\}$; initially we assume $\mathit{status}_u = \bot$. An
algorithm $A$ solves leader election in $T$ rounds if, from round $T$ on, exactly one node has its status set to
ELECTED while all other nodes are in state NON-ELECTED. This is the requirement for standard (implicit) leader
election.
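
As a small illustration of the definition above, the following Python sketch checks whether a list of status values meets the implicit leader election requirement. The names `Status` and `solves_leader_election` are hypothetical helpers introduced here for illustration; they do not appear in the paper.

```python
from enum import Enum


class Status(Enum):
    UNDECIDED = 0      # the initial value (bottom)
    NON_ELECTED = 1
    ELECTED = 2


def solves_leader_election(statuses):
    """Return True iff exactly one node is ELECTED and all others are NON_ELECTED."""
    elected = sum(1 for s in statuses if s is Status.ELECTED)
    non_elected = sum(1 for s in statuses if s is Status.NON_ELECTED)
    return elected == 1 and non_elected == len(statuses) - 1
```

A configuration with two ELECTED nodes, or with a node still in the initial state, is rejected by this check.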

2 Randomized Leader Election in Complete Networks 

To provide the intuition for our general result, let us first illustrate a simpler version of our leader
election algorithm, adapted to complete networks. More specifically, this section presents an algorithm that,
with high probability, solves leader election in complete networks in $O(1)$ rounds and sends no more than
$O(\sqrt{n}\log^{3/2} n)$ messages. Let us first briefly describe the main ideas of Algorithm 1 (see pseudo-code
below). Initially, the algorithm attempts to reduce the number of leader candidates as far as possible, while
still guaranteeing that there is at least one candidate (with high probability). Non-candidate nodes enter the
NON-ELECTED state immediately, and thereafter only reply to messages initiated by other nodes. Every
node $u$ becomes a candidate with probability $\frac{2\log n}{n}$ and selects a random rank $r_u$ chosen from some large
domain. Each candidate node then randomly selects $2\lceil\sqrt{n \log n}\rceil$ other nodes as referees and informs all
referees of its rank. The referees compute the maximum (say $r_w$) of all received ranks, and send a "winner"
notification to the node $w$. If a candidate wins all competitions, i.e., receives "winner" notifications from all
of its referees, it enters the ELECTED state and becomes the leader.

Algorithm 1 Randomized Leader Election in Complete Graphs

Round 1:

1: Every node $u$ decides to become a candidate with probability $\frac{2\log n}{n}$ and generates a random rank $r_u$
   from $\{1, \dots, n^4\}$.
   If a node does not become a candidate, it immediately enters the NON-ELECTED state; otherwise it
   executes the following steps.

2: Choosing Referees: Node $u$ samples $2\lceil\sqrt{n \log n}\rceil$ neighbors (the referees) and sends a message $(u, r_u)$
   to each referee.

Round 2:

3: Winner Notification: A referee $v$ considers all received messages and sends a winner notification to
   the node $w$ that satisfies $r_w \ge r_u$ for every message $(u, r_u)$.

4: Decision: If a node receives winner notifications from all its referees, then it enters the ELECTED state,
   otherwise it sets its state to NON-ELECTED.
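
The two rounds of Algorithm 1 can be sketched as a centralized simulation in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_algorithm_1` is a hypothetical name, the logarithm is taken to be natural, referees are sampled without replacement among the other n - 1 nodes, and a tie in rank at a referee is broken arbitrarily.

```python
import math
import random


def run_algorithm_1(n, rng):
    """Simulate one execution on a complete graph with nodes 0..n-1.

    Returns (leaders, messages): the nodes that end up ELECTED and the total
    number of messages sent in both rounds.
    """
    p = min(1.0, 2 * math.log(n) / n)                           # candidate probability
    k = min(n - 1, 2 * math.ceil(math.sqrt(n * math.log(n))))   # referees per candidate

    # Round 1, step 1: each node independently becomes a candidate and draws a rank.
    rank = {u: rng.randint(1, n ** 4) for u in range(n) if rng.random() < p}

    # Round 1, step 2: each candidate sends (u, r_u) to k sampled referees.
    inbox = {}
    messages = 0
    for u, r in rank.items():
        for i in rng.sample(range(n - 1), k):
            v = i if i < u else i + 1        # shift indices so that u never picks itself
            inbox.setdefault(v, []).append((u, r))
            messages += 1

    # Round 2, step 3: each referee notifies the highest-ranked candidate it heard from.
    wins = dict.fromkeys(rank, 0)
    for received in inbox.values():
        winner, _ = max(received, key=lambda msg: msg[1])
        wins[winner] += 1
        messages += 1

    # Round 2, step 4: a candidate is elected iff all of its k referees notified it.
    leaders = [u for u in rank if wins[u] == k]
    return leaders, messages
```

For example, `run_algorithm_1(1024, random.Random(1))` returns the elected nodes together with the message count. For small n the constants dominate, so the message count only drops below n for fairly large networks.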



Theorem 1. Consider a complete network of $n$ nodes and assume the CONGEST model of
communication. Algorithm 1 solves leader election with high probability, terminates in $O(1)$ rounds, and uses
$O(\sqrt{n}\log^{3/2} n)$ messages with high probability.

Proof. Since all nodes enter either the ELECTED or NON-ELECTED state after two rounds at the latest, we
get the runtime bound of $O(1)$.
We now argue the message complexity bound. In expectation, there are $2 \log n$ candidate nodes. By
using a standard Chernoff bound (cf. Theorem 4.4 in [22]), there are at most $7 \log n$ candidate nodes with
probability at least $1 - n^{-2}$. In step 3 of the algorithm, each referee only sends messages to the candidate
nodes by which it has been contacted. Since there are $O(\log n)$ candidates and each contacts $\Theta(\sqrt{n \log n})$
referees, the total number of messages sent is bounded by $O(\sqrt{n}\log^{3/2} n)$ with high probability.

Finally, we show that Algorithm 1 solves leader election with high probability. With probability
$\left(1 - \frac{2\log n}{n}\right)^n \le \exp(-2\log n) = n^{-2}$, no node becomes a candidate. Hence the probability that at least
one node is elected as leader is at least $1 - n^{-2}$. Let $\ell$ be the node that generates the highest random rank
$r_\ell$ among all candidate nodes; with high probability, $\ell$ is unique. Clearly, node $\ell$ enters the ELECTED state,
since it receives "winner" notifications from all its referees.
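
The bound on the probability that no node becomes a candidate can be spot-checked numerically. The sketch below is only an illustration and assumes natural logarithms; `prob_no_candidate` is a hypothetical helper name.

```python
import math


def prob_no_candidate(n):
    """Probability that none of n nodes becomes a candidate when each node
    independently becomes one with probability 2*log(n)/n."""
    return (1 - 2 * math.log(n) / n) ** n


# Compare the exact value with the bound n^(-2) for a few network sizes.
bound_holds = {n: prob_no_candidate(n) <= n ** -2 for n in (16, 256, 4096, 65536)}
```

The comparison rests on the inequality $1 - x \le e^{-x}$ applied with $x = 2\log n / n$.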

Now consider some other candidate node $v$. This candidate chooses its referees randomly among all
nodes. Therefore, the probability that an individual referee selected by $v$ is among the referees chosen by $\ell$
is at least $2\sqrt{\frac{\log n}{n}}$. It follows that the probability that $\ell$ and $v$ do not choose any common referee node is at most

$$\left(1 - 2\sqrt{\frac{\log n}{n}}\right)^{2\sqrt{n \log n}} \le \exp(-4 \log n) = n^{-4},$$

which means that with high probability, some node $x$ serves as common referee to $\ell$ and $v$. By assumption,
we have $r_v < r_\ell$, which means that node $v$ does not receive $2\lceil\sqrt{n \log n}\rceil$ "winner" notifications, and thus
it subsequently enters the NON-ELECTED state. By taking a union bound over all other candidate nodes, it
follows that with probability at least $1 - \frac{1}{n}$ no other node except $\ell$ wins all of its competitions, and therefore,
node $\ell$ is the only node to become a leader. □
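
The birthday-paradox step in the proof above can be checked numerically. The following sketch computes the probability that two referee sets of size k are disjoint and compares it with $n^{-4}$. It makes simplifying assumptions that are not taken from the paper: both candidates sample k distinct referees uniformly from the same pool of n nodes, and logarithms are natural. The helper name `prob_disjoint` is hypothetical.

```python
import math


def prob_disjoint(n, k):
    """Probability that a uniformly random k-subset of n nodes avoids a fixed
    k-subset (requires 2*k <= n)."""
    prob = 1.0
    for i in range(k):
        # The (i+1)-th pick must avoid the k fixed referees among n - i remaining nodes.
        prob *= (n - k - i) / (n - i)
    return prob


def referees_per_candidate(n):
    return 2 * math.ceil(math.sqrt(n * math.log(n)))


checks = {n: prob_disjoint(n, referees_per_candidate(n)) <= n ** -4
          for n in (256, 1024, 4096)}
```

Each factor in the product is at most $1 - k/n$, so the product is at most $\exp(-k^2/n)$, which is at most $n^{-4}$ for this choice of k.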

With a simple modification, our algorithm also works for non-simultaneous wake-up of nodes, though in
that case we cannot guarantee termination in $O(1)$ rounds.



2.1 Non-Simultaneous Wake-Up of Nodes

So far, we have assumed that all nodes are up and running at the start of round 1. We now describe a simple
extension of Algorithm 1 that preserves the low message complexity bound in a model where nodes are
woken up at arbitrary times by the adversary (similarly to [1]). The main idea is to require a referee node $v$
to only send winner notifications in the first round $r$ in which $v$ receives a message from some candidate node.
This ensures that the candidate $u$ that has the highest random rank among all initially awake candidate nodes
will become leader. Let $R$ be the set of referees chosen by the winner $u$. To see that there is a unique leader,
we observe that, analogously to the proof of Theorem 1, any candidate that wakes up in some round $r'$
will choose a referee among the nodes in $R$ with high probability. Unfortunately, we can no longer guarantee
termination within $O(1)$ rounds, since the adversary can simply delay waking up all but one node $u$, which
has only probability $\frac{2\log n}{n}$ of becoming a candidate.

Corollary 1. Consider a complete network of $n$ nodes and assume the CONGEST model of communication
where nodes are woken up at arbitrary times by the adversary. There is an algorithm that elects a unique
leader (w.h.p.), while using $O(\sqrt{n}\log^{3/2} n)$ messages (w.h.p.).






3 Randomized Leader Election in General Graphs 



In this section, we present our main algorithm, which elects a unique leader (w.h.p.), and terminates in
$O(\tau(G, n))$ rounds while using $O(\tau(G, n)\sqrt{n}\log^{3/2} n)$ messages (w.h.p.), where $\tau(G, n)$ is the mixing
time of a random walk on $G$. Initially, a node $u$ only knows the mixing time (or a constant factor estimate
of) $\tau(G, n)$ (defined below in (1)); in particular, $u$ does not have any a priori knowledge about the actual
topology of $G$.

The algorithm presented here requires nodes to perform random walks on the network by token
forwarding in order to choose sufficiently many referee nodes at random. Thus, random walks essentially play
the role of the sampling done in Algorithm 1, and the two algorithms are conceptually similar. Whereas in the complete graph
randomly chosen nodes act as referees, here any intermediate node (in the random walk) that sees tokens
from two competing candidates can act as a referee and notify the winner. One slight complication we have
to deal with in the general setting is that in the CONGEST model it is impossible to perform too many
walks in parallel along an edge. We solve this issue by sending only the count of tokens that need to be sent
by a particular candidate, and not the tokens themselves.

While using random walks can be viewed as a generalization of the sampling performed in Algorithm 1,
showing that two candidate nodes intersect in at least one referee leads to an interesting balls-into-bins
scenario where balls (i.e., random walks) have a non-uniform probability to be placed in some bin (i.e.,
reach a referee node). This non-uniformity of the random walk distribution stems from the fact that $G$
might not be a regular graph. We show that the non-uniform case does not worsen the probability of two
candidates reaching a common referee, and hence an analysis similar to the one given for complete graphs
goes through.

We now introduce some basic notation for random walks. Suppose that $V = \{u_1, \dots, u_n\}$ and let
$d_i$ denote the degree of node $u_i$. The $n \times n$ transition matrix $A$ of $G$ has entries $a_{i,j} = \frac{1}{d_i}$ if there is an
edge $(u_i, u_j) \in E$, otherwise $a_{i,j} = 0$. Entry $a_{i,j}$ gives the probability that a random walk moves from node
$u_i$ to node $u_j$. The position of a random walk after $k$ steps is represented by a probability distribution
$\pi_k$ determined by $A$. If some node $u_i$ starts a random walk, the initial distribution $\pi_0$ of the walk is an
$n$-dimensional vector having all zeros except at index $i$ where it is 1. Once node $u_i$ has chosen a random
neighbor to forward the token, the distribution of the walk after 1 step is given by $\pi_1 = A\pi_0$ and in
general we have $\pi_k = A^k\pi_0$. If $G$ is non-bipartite and connected, then the distribution of the walk will
eventually converge to the stationary distribution $\pi^* = (b_1, \dots, b_n)$, which has entries $b_i = \frac{d_i}{2|E|}$ and
satisfies $\pi^* = A\pi^*$.

We define the mixing time $\tau(G, n)$ of a graph $G$ with $n$ nodes as the minimum $k$ such that, for all starting
distributions $\pi_0$,

$$\|A^k\pi_0 - \pi^*\|_\infty \le \frac{1}{2n}, \qquad (1)$$

where $\|\cdot\|_\infty$ denotes the usual maximum norm on a vector. Clearly, if $G$ is a complete network, then
$\tau(G, n) = 1$. For expander graphs it is well known that $\tau(G, n) \in O(\log n)$. Note that mixing time is
well-defined only for non-bipartite graphs; however, by using a lazy random walk strategy (i.e., with probability
1/2 stay at the current node; otherwise proceed as usual) our algorithm will work for bipartite graphs as
well.
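
To make the definition concrete, here is a minimal Python sketch that computes a mixing time by brute force on a small graph. It assumes the graph is given as adjacency lists, takes the threshold `eps` on the right-hand side of (1) as a parameter supplied by the caller, and only checks point-mass starting distributions (which suffices, since any starting distribution is a convex combination of these). The names `walk_step` and `mixing_time` are hypothetical.

```python
def walk_step(adj, dist, lazy=False):
    """Advance a probability distribution over the nodes by one random-walk step."""
    new = [0.0] * len(adj)
    for u, mass in enumerate(dist):
        if lazy:
            new[u] += mass / 2          # stay at the current node with probability 1/2
            mass /= 2
        for v in adj[u]:
            new[v] += mass / len(adj[u])
    return new


def mixing_time(adj, eps, lazy=False, max_steps=1000):
    """Smallest k such that, from every start node, the k-step distribution is within
    eps of the stationary distribution in the maximum norm; None if not reached."""
    n = len(adj)
    total_degree = sum(len(nbrs) for nbrs in adj)            # equals 2|E|
    stationary = [len(nbrs) / total_degree for nbrs in adj]
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(max_steps + 1):
        worst = max(abs(d[j] - stationary[j]) for d in dists for j in range(n))
        if worst <= eps:
            return k
        dists = [walk_step(adj, d, lazy) for d in dists]
    return None
```

On the bipartite 4-cycle the plain walk never gets close to the stationary distribution, while the lazy walk does, which illustrates why the lazy strategy is needed for bipartite graphs. For the complete graph on 4 nodes without self-loops and `eps = 1/8`, this sketch reports 2, because after a single step the walk cannot be at its starting node.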

First, we prove a useful lemma: 

Lemma 1. Consider $p$ balls that are placed into $n$ bins according to some probability distribution $\pi$ and let
$p_i$ be the $i$-th entry of $\pi$. Let $X_i$ be the indicator random variable that is 1 if there is a collision (of random
walks) at referee node $i$. Then $P\left[\bigcap_{i=1}^{n} (X_i = 0)\right]$ is maximized for the uniform distribution.
Algorithm 2 Randomized Leader Election

1: VAR origin ← 0; winner-so-far ← $\bot$

2: Initially, node $u$ decides to become a candidate with probability $\frac{2\log n}{n}$ and generates a random rank $r_u$
   from $\{1, \dots, n^4\}$.

Initiating Random Walks:

3: Node $u$ creates $2\lceil\sqrt{n \log n}\rceil$ tokens of type $(r_u, k)$.

4: Node $u$ starts $2\lceil\sqrt{n \log n}\rceil$ random walks (called competitions), each of which is represented by the
   random walk token $(r_u, k)$ (of $O(\log n)$ bits) where $r_u$ represents $u$'s random rank. The counter $k$ is the
   number (initially 1) of walks that are represented by this token (explained in Line 8).

Disqualifying hopeless candidates (note that any node can be a referee and notify winner/loser):

5: A node $v$ discards every received token $(r_u, k)$ if $v$ has received (possibly in the same round) a token $r_w$
   with $r_w > r_u$.

6: if a received token $(r_w, k')$ is not discarded and winner-so-far $\ne r_w$ then

7:    Node $v$ remembers the port of an arbitrarily chosen neighbor that sent one of the (possibly merged)
      tokens containing $r_w$ in its variable origin and sets its variable winner-so-far to $r_w$.

Token Forwarding:

8: Let $\mu = (r_u, k)$ be a token received by $v$ and suppose that $\mu$ is not discarded in Line 5. For simplicity,
   we consider all distinct tokens that arrive in the current round containing the same value $r_u$ at $v$ to be
   merged into a single token $(r_u, k)$ before processing, where $k$ holds the accumulated count. Node $v$
   randomly samples $k$ times from its neighbors. If a neighbor $x$ was chosen $k_x \le k$ times, $v$ sends a token
   $(r_u, k_x)$ to $x$.

Notifying a Winner in round $\tau(G, n)$:

9: if winner-so-far $\ne \bot$ then

10:   Suppose that node $v$ has not discarded some token generated by $w$. According to Line 5, $w$ has
      generated the largest rank among all tokens seen by $v$.

11:   Node $v$ generates a winner notification (WIN, $r_w$, cnt) for $r_w$ and sends it to the neighbor stored in
      origin (cf. Line 7). The field cnt is set to 1 by $v$ and contains the number of winner notifications
      represented by this token.

12: If a node $u$ receives (possibly) multiple winner notifications for $r_w$, it simply forwards a token
    (WIN, $r_w$, cnt') to the neighbor stored in origin where cnt' is the accumulated count of all received
    tokens.

Decision:

13: If a node wins all competitions, i.e., receives $2\lceil\sqrt{n \log n}\rceil$ winner notifications, it enters the ELECTED
    state, otherwise it sets its state to NON-ELECTED.
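
The merging and forwarding of tokens in Lines 5 to 8 can be sketched in Python as follows. This is a simplified illustration under assumptions that go beyond the pseudo-code: the function name `forward_tokens` and its signature are hypothetical, tokens are `(rank, count)` pairs, and the bookkeeping of the origin port (Line 7) is omitted.

```python
import random
from collections import Counter


def forward_tokens(received, neighbors, rng, winner_so_far=None):
    """One token-forwarding step at a node v.

    received:      list of (rank, count) tokens that arrived in this round
    neighbors:     list of v's neighbors
    winner_so_far: highest rank v has seen in earlier rounds, or None
    Returns (outgoing, winner_so_far), where outgoing maps each chosen neighbor x
    to the single merged token (rank, k_x) sent to x.
    """
    if not received:
        return {}, winner_so_far
    best = max(rank for rank, _ in received)
    if winner_so_far is not None and winner_so_far > best:
        return {}, winner_so_far          # Line 5: every received token is discarded
    # Merge all surviving tokens (those carrying the highest rank) into one count k.
    k = sum(count for rank, count in received if rank == best)
    # Line 8: sample a neighbor uniformly and independently for each of the k walks.
    picks = Counter(rng.choice(neighbors) for _ in range(k))
    outgoing = {x: (best, k_x) for x, k_x in picks.items()}
    return outgoing, best
```

Because only tokens of a single rank survive and they are merged per neighbor, at most one token travels over each edge per round, which is the property the CONGEST model requires.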



Proof. By definition, we have

$$P[X_i = 1] = \left(1 - (1 - p_i)^p\right)^2.$$

Note that the events $X_i = 1$ and $X_j = 1$ are not necessarily independent. A common technique to treat
dependencies in balls-into-bins scenarios is the Poisson approximation where we consider the number of
balls in each bin to be independent Poisson random variables with mean $p/n$. This means we can apply
Corollary 5.11 of [22], which states that if some event $E$ occurs with probability $p$ in the Poisson case,
it occurs with probability at most $2p$ in the exact case, i.e., we only lose a constant factor by using the
Poisson approximation. A precondition for applying Corollary 5.11 is that the probability for event $E$
monotonically decreases (or increases) in the number of balls, which is clearly the case when counting the
number of collisions of balls. Considering the Poisson case, we get



$$P\left[\bigcap_{i=1}^{n} (X_i = 0)\right] = \prod_{i=1}^{n} P[X_i = 0] = \prod_{i=1}^{n} \left(1 - \left(1 - (1 - p_i)^p\right)^2\right).$$




To maximize $P\left[\bigcap_{i=1}^{n} (X_i = 0)\right]$, it is thus sufficient to minimize $\sum_{i=1}^{n} p_i^2$ under the constraint $\sum_{i=1}^{n} p_i = 1$.
Using Lagrangian optimization it follows that this is minimized for the uniform distribution. □
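
The statement of Lemma 1 can be spot-checked numerically. The sketch below evaluates the product $\prod_i \left(1 - (1 - (1 - p_i)^p)^2\right)$, treating the bins as independent, for the uniform distribution and for one skewed distribution. The skewed distribution, the parameters n = 100 and p = 20, and the name `prob_no_collision` are arbitrary choices made for this illustration; a single comparison is of course not a proof.

```python
def prob_no_collision(dist, p):
    """Product over all bins i of 1 - (1 - (1 - p_i)^p)^2, i.e., the probability that
    no bin receives a ball from both candidates when bins are treated as independent."""
    result = 1.0
    for p_i in dist:
        result *= 1.0 - (1.0 - (1.0 - p_i) ** p) ** 2
    return result


n, p = 100, 20
uniform = [1.0 / n] * n
# Hypothetical non-uniform distribution with probabilities proportional to 1, 2, ..., n.
skewed = [2.0 * (i + 1) / (n * (n + 1)) for i in range(n)]

uniform_value = prob_no_collision(uniform, p)
skewed_value = prob_no_collision(skewed, p)
```

In this instance `uniform_value` exceeds `skewed_value`, in line with the lemma: the skewed distribution concentrates mass on a few bins and therefore makes a collision more likely.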

Theorem 2. Consider a non-bipartite network $G$ of $n$ nodes with mixing time $\tau(G, n)$, and assume the
CONGEST model of communication. Algorithm 2 solves leader election with high probability, terminates
within $O(\tau(G, n))$ rounds, and uses $O(\tau(G, n)\sqrt{n}\log^{3/2} n)$ messages with high probability.

Proof. We first argue the message complexity bound. In expectation, there are $O(\log n)$ candidate nodes.
By using a standard Chernoff bound (cf. Theorem 4.4 in [22]), there are at most $7 \log n$ candidate nodes
with probability at least $1 - n^{-2}$. Every candidate node $u$ contacts $\Theta(\sqrt{n \log n})$ referee nodes and initiates
a random walk of length $\tau(G, n)$ for each of the $\Theta(\sqrt{n \log n})$ referees. By the description of the algorithm,
each referee node only sends messages to the candidate nodes by which it has been contacted. Since we
have $O(\log n)$ candidates, the total number of messages sent is bounded by $O(\tau(G, n)\sqrt{n}\log^{3/2} n)$ with
high probability.

The running time bound depends on the time that it takes to complete the $2\lceil\sqrt{n \log n}\rceil$ random walks in
parallel and the notification of the winner. By Line 5 it follows that a node only forwards at most one token
to any neighbor in a round; thus there is no delay due to congestion. Moreover, for notifying the winner,
nodes forward the winner notification for winner $w$ to the neighbor stored in origin. According to Line 7,
a node sets origin to a neighbor from which it has received the first token originating from $w$. Thus there
can be no loops when forwarding the winner notifications, which reach the winner $w$ in at most $\tau(G, n)$
rounds.

We now argue that Algorithm 2 solves leader election with high probability. Similarly to Algorithm 1, it
follows that there will be at least one leader with high probability. Let $\ell$ be the candidate that generated the
(unique) highest random rank among all candidates and consider some other candidate node $v$, i.e., we have
that $r_v < r_\ell$ by assumption. By the description of the algorithm, node $v$ chooses its referees by performing
$p = 2\lceil\sqrt{n \log n}\rceil$ random walks of length $\tau(G, n)$. We cannot argue the same way as in the proof of
Algorithm 1, since in general, the stationary distribution of $G$ might not be the uniform distribution vector
$\left(\frac{1}{n}, \dots, \frac{1}{n}\right)$. Let $p_i$ be the $i$-th entry of the stationary distribution. Let $X_i$ be the indicator random variable
that is 1 if there is a collision (of random walks) at referee node $i$. We have $P[X_i = 1] = \left(1 - (1 - p_i)^p\right)^2$.
We want to show that the probability of error (i.e., having no collisions) is small; in other words, we want to
upper bound $P\left[\bigcap_{i=1}^{n} (X_i = 0)\right]$. Lemma 1 shows that it is sufficient to obtain a bound for the case when the
stationary distribution is uniform. By (1), the probability of such a walk hitting any of the referees chosen
by $\ell$ is at least $\sqrt{\frac{\log n}{n}}$. It follows that the probability that $\ell$ and $v$ do not choose a common referee node is
at most

$$\left(1 - \sqrt{\frac{\log n}{n}}\right)^{2\sqrt{n \log n}} \le \exp(-2 \log n). \qquad (2)$$



Therefore, the event that node $v$ does not receive sufficiently many winner notifications happens with
probability $\ge 1 - n^{-2}$, which requires $v$ to enter the NON-ELECTED state. By taking a union bound over all other
candidate nodes, it follows that with high probability no other node except $\ell$ will win all of its competitions,
and therefore, node $\ell$ is the only node to become a leader with probability at least $1 - \frac{1}{n}$. □



4 Lower Bound 

In this section we prove a lower bound on the number of messages required by any algorithm that solves
leader election with probability at least $1 - 1/n$.

Our model assumes that all processors execute the exact same algorithm and have access to an unbiased 
private coin. So far we have assumed that nodes are not equipped with unique ids. Nevertheless, our lower 
bound still holds even if the nodes start with unique ids. 

Our lower bound applies to all algorithms that send only $o(\sqrt{n})$ messages with probability at least
$1 - 1/n$. In other words, the result still holds for algorithms that have a small but nonzero probability of
producing runs where the number of messages sent is much larger (i.e., $\Omega(\sqrt{n})$). We show the result for the
LOCAL model, which implies the same for the CONGEST model.

Theorem 3. Consider any algorithm A that uses $f(n)$ messages (of arbitrary size) with high probability
on a complete network of $n$ nodes. If A solves leader election in $O(1)$ rounds with high probability, then
$f(n) \in \Omega(\sqrt{n})$. Moreover, $f(n) \in \Omega(n^{1/3})$ for any algorithm A using any number of rounds that solves
leader election with high probability. This holds even if nodes are equipped with unique identifiers (chosen
by the adversary).

Proof. We first show the result for the case where nodes are anonymous, i.e., are not equipped with unique 
identifiers, and later on extend the impossibility to the non-anonymous case by an easy reduction. 

Assume that there is some algorithm A that solves leader election with high probability but sends only
$f(n)$ messages. The remainder of the proof involves showing that this yields a contradiction. Consider a
complete network where for every node, the adversary chooses the connections of its ports as a random
permutation on $\{1, \dots, n-1\}$.

For a given run $\alpha$ of an algorithm, define the communication graph $\mathcal{C}^r(\alpha)$ to be a directed graph on
the given set of $n$ nodes where there is an edge from $u$ to $v$ if and only if $u$ sends a message to $v$ in some
round $r' \le r$ of the run $\alpha$. For any node $u$, denote the state of $u$ in round $r$ of the run $\alpha$ by $\sigma_r(u, \alpha)$. Let
$\Sigma$ be the set of all node states possible in algorithm A. (When $\alpha$ is known, we may simply write $\mathcal{C}^r$ and
$\sigma_r(u)$.) With each node $u \in \mathcal{C}^r$, associate its state $\sigma_r(u)$ in $\mathcal{C}^r$, the communication graph of round $r$. We
say that node $u$ influences node $w$ by round $r$ if there is a directed path from $u$ to $w$ in $\mathcal{C}^r$. (Our notion
of influence is more general than the causality-based "happens-before" relation of [17], since a directed
path from $u$ to $w$ is necessary but not sufficient for $w$ to be causally influenced by $u$.) A node $u$ is an
initiator if it is not influenced before sending its first message. Note that a mute node that never receives
any messages is also an initiator. For every initiator $u$, we define the influence cloud $\mathcal{IC}^r_u$ as the pair
$\mathcal{IC}^r_u = (C^r_u, S^r_u)$, where $C^r_u = (u, w_1, \dots, w_k)$ is the ordered set of all nodes that are influenced by $u$,
namely, that are reachable along a directed path in $\mathcal{C}^r$ from $u$, ordered by the time by which they joined
the cloud, and $S^r_u = (\sigma_r(u, \alpha), \sigma_r(w_1, \alpha), \dots, \sigma_r(w_k, \alpha))$ is their configuration after round $r$, namely, their
current tuple of states. (In what follows, we sometimes abuse notation by referring to the ordered node set
$C^r_u$ as the influence cloud of $u$.) Note that a passive (non-initiator) node $v$ does not send any messages before
receiving the first message from some other node.
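The definitions of initiators and influence clouds can be made concrete on a recorded message log. The sketch below is an illustration under stated assumptions, not part of the proof: the log format (a list of `(round, sender, receiver)` triples) and the function names `find_initiators` and `influence_cloud` are hypothetical, a message sent in round $t$ is assumed to be seen by its receiver only after the sends of round $t$, and ties among nodes joining a cloud in the same round are broken by sorting.

```python
from collections import defaultdict


def find_initiators(nodes, messages):
    """Return the nodes that send before receiving anything (or never receive).

    Assumption: a node whose first send and first receipt fall in the same
    round still counts as an initiator, since it sent before being influenced.
    """
    first_send, first_recv = {}, {}
    for rnd, sender, receiver in messages:
        first_send[sender] = min(rnd, first_send.get(sender, rnd))
        first_recv[receiver] = min(rnd, first_recv.get(receiver, rnd))
    result = []
    for u in nodes:
        if u not in first_recv:
            result.append(u)  # never influenced; covers mute nodes too
        elif u in first_send and first_send[u] <= first_recv[u]:
            result.append(u)
    return result


def influence_cloud(u, messages, r):
    """Ordered node set of u's influence cloud after round r.

    Nodes appear in the order of the round in which they became reachable
    from u in the communication graph of that round.
    """
    edges_by_round = defaultdict(list)
    for rnd, sender, receiver in messages:
        if rnd <= r:
            edges_by_round[rnd].append((sender, receiver))
    out_edges = defaultdict(set)
    cloud, seen = [u], {u}
    for rnd in range(1, r + 1):
        for sender, receiver in edges_by_round[rnd]:
            out_edges[sender].add(receiver)
        # Re-explore from every cloud member, using all edges up to this round.
        frontier, joined = list(cloud), []
        while frontier:
            x = frontier.pop()
            for y in out_edges[x]:
                if y not in seen:
                    seen.add(y)
                    joined.append(y)
                    frontier.append(y)
        cloud.extend(sorted(joined))
    return cloud


if __name__ == "__main__":
    log = [(1, 0, 1), (2, 1, 2), (1, 3, 4), (2, 4, 3)]
    print(find_initiators(range(6), log))
    print(influence_cloud(0, log, 2), influence_cloud(3, log, 2))
```

Reachability is computed in the whole communication graph of round $r$ and not along time-respecting paths, which matches the remark that this notion of influence is more general than causal influence.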

Since we are only interested in algorithms that send a finite number of messages, in every execution $\alpha$
there is some round $\rho = \rho(\alpha)$ by which no more messages are sent.

In general, it is possible that in a given execution, two influence clouds $C^r_{u_1}$ and $C^r_{u_2}$ intersect each other
over some common node $v$, if $v$ happens to be influenced by both $u_1$ and $u_2$. The following lemma shows
that the low message complexity of algorithm A yields a good probability for all influence clouds to be
disjoint from each other.

Hereafter, we fix a run $\alpha$ of algorithm A. Let $N_i$ be the event that there is no intersection between (the
node sets of) the influence clouds existing at the end of round $i$, i.e., $C^i_u \cap C^i_{u'} = \emptyset$ for every two initiators
$u, u'$. Let $N^r = \bigwedge_{i=1}^{r} N_i$. Let $N = N^{\rho}$ be the corresponding event at the end of the run $\alpha$. Let $M$ be the
event that algorithm A sends no more than $f(n)$ messages in the run $\alpha$.

Lemma 2. Assume that $P[M] \ge 1 - \frac{1}{n}$. Then either of the following two conditions is sufficient to ensure
that $P[N \wedge M] \ge 1 - o(1)$:

(a) $f(n) \in o(\sqrt{n})$ and A terminates in $O(1)$ rounds, or

(b) $f(n) \in o(n^{1/3})$ and A terminates (in an arbitrary number of rounds).

Proof. Under the assumption that $P[M] \ge 1 - \frac{1}{n}$, and since $P[N \wedge M] = P[N \mid M]\, P[M]$, it suffices to
show that $P[N \mid M] \ge 1 - o(1)$ under the assumptions (a) or (b).

We first show the claim assuming (a). To prove the claim, we show by induction on $r$ that $P[N^r \wedge M] \ge
1 - o(1)$ for every $0 \le r \le \rho$. For $r = 0$ the claim is immediate. Now assume the claim for rounds up to
$r - 1$ and consider round $r$. Consider some cloud $C^r$ and any node $v \in C^r$. Conditioning on $M$, there are
at most $f(n)$ nodes in all other clouds except $C^r$. Recall that the port numbering of every node was chosen
uniformly at random and, since we assume $M$, any node knows the destinations of at most $f(n)$ of its ports
in any round. To send a message to a node in another cloud, $v$ must hit upon one of the (at most $f(n)$) ports
leading to other clouds, from among its (at least $n - f(n)$) yet unexposed ports. Therefore, the probability
that a message sent by $v$ reaches a node in another cloud is at most $\frac{f(n)}{n - f(n)}$, which implies that

$$P[N_r \mid N^{r-1} \wedge M] \ge \left(1 - \frac{f(n)}{n - f(n)}\right)^{f(n)} \ge \exp\left(-\frac{2 f^2(n)}{n - f(n)}\right) \ge 1 - \frac{2 f^2(n)}{n - f(n)} = 1 - o(1). \quad (3)$$

The probability that there are no intersections up to round $r$ satisfies

$$P[N^r \mid M] = P[N_r \wedge N^{r-1} \mid M] = P[N_r \mid N^{r-1} \wedge M] \cdot P[N^{r-1} \mid M] = 1 - o(1),$$

where the last equality follows by Eq. (3) and the inductive hypothesis (note that $P[N^{r-1} \mid M] \ge P[N^{r-1} \wedge M]$).
Since $P[M] \ge 1 - \frac{1}{n}$, this completes the inductive proof and establishes that $P[N \wedge M] = P[N^{\rho} \wedge M] = 1 - o(1)$.

Next, we prove the claim assuming (b), i.e., $f(n) \in o(n^{1/3})$. Let $D_v$ be the event that node $v$ sends
a message to some other cloud on some round of the run $\alpha$. By the same argument as in case (a), the
probability that a message sent by $v$ reaches a node in another cloud is at most $\frac{f(n)}{n - f(n)}$. Moreover, $v$ sends
at most $f(n)$ messages. Therefore, $P[D_v \mid M] \le \frac{f^2(n)}{n - f(n)}$. Note that $\bar{N}$, the complementary event to $N$,
satisfies $\bar{N} = \bigvee_{i=1}^{n} D_{v_i}$, and, considering that at most $f(n)$ nodes can send a message in total, we have that
$P[\bar{N} \mid M] \le \frac{f^2(n)}{n - f(n)} \cdot f(n) = \frac{f^3(n)}{n - f(n)}$. Clearly, this probability is $o(1)$, since by assumption $f(n) \in o(n^{1/3})$,
thus we get $P[N \wedge M] = 1 - o(1)$. □
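To get a feel for how quickly the two failure terms in the proof of Lemma 2 vanish, one can evaluate them numerically. The sketch below takes the terms to be $2f^2(n)/(n - f(n))$ for case (a) and $f^3(n)/(n - f(n))$ for case (b), which is our reading of the proof. The function names and the sample exponents $0.4$ and $0.3$ (chosen only because they lie below $1/2$ and $1/3$) are illustrative assumptions.

```python
def crossing_bound_constant_rounds(n: float, f: float) -> float:
    """Case (a): bound 2 f^2 / (n - f) on cloud intersection in one round."""
    return 2.0 * f * f / (n - f)


def crossing_bound_any_rounds(n: float, f: float) -> float:
    """Case (b): bound f^3 / (n - f) on cloud intersection over the whole run."""
    return f ** 3 / (n - f)


if __name__ == "__main__":
    for n in (1e6, 1e8, 1e10, 1e12):
        a = crossing_bound_constant_rounds(n, n ** 0.4)  # f(n) = n^0.4 is o(sqrt(n))
        b = crossing_bound_any_rounds(n, n ** 0.3)       # f(n) = n^0.3 is o(n^(1/3))
        print(f"n={n:.0e}  case (a): {a:.4f}  case (b): {b:.4f}")
```

Both columns decrease as $n$ grows, slowly for message budgets this close to the thresholds, which is consistent with the $o(1)$ claims in the lemma.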

We next consider potential cloud configurations, namely, $Z = (\sigma_0, \sigma_1, \dots, \sigma_k)$, where $\sigma_i \in \Sigma$ for
every $i$, and more generally, potential cloud configuration sequences $\bar{Z}^r = (Z^1, \dots, Z^r)$, where each $Z^i$
is a potential cloud configuration, which may potentially occur as the configuration tuple of some influence
clouds in round $i$ of some execution of algorithm A (in particular, the lengths of the cloud configurations
$Z^i$ are monotonically non-decreasing). We study the occurrence probability of potential cloud configuration
sequences.

We say that the potential cloud configuration $Z = (\sigma_0, \sigma_1, \dots, \sigma_k)$ is realized by the initiator $u$ in round
$r$ of execution $\alpha$ if the influence cloud $\mathcal{IC}^r_u = (C^r_u, S^r_u)$ has the same node states in $S^r_u$ as those of $Z$, or
more formally, $S^r_u = (\sigma_r(u, \alpha), \sigma_r(w_1, \alpha), \dots, \sigma_r(w_k, \alpha))$, such that $\sigma_r(u, \alpha) = \sigma_0$ and $\sigma_r(w_i, \alpha) = \sigma_i$
for every $i \in [1..k]$. In this case, the influence cloud $\mathcal{IC}^r_u$ is referred to as a realization of the potential cloud
configuration $Z$. (Note that a potential cloud configuration may have many different realizations.)

More generally, we say that the potential cloud configuration sequence $\bar{Z}^r = (Z^1, \dots, Z^r)$ is realized
by the initiator $u$ in execution $\alpha$ if for every round $i = 1, \dots, r$, the influence cloud $\mathcal{IC}^i_u$ is a realization
of the potential cloud configuration $Z^i$. In this case, the sequence of influence clouds of $u$ up to round
$r$, $\overline{\mathcal{IC}}^r_u = (\mathcal{IC}^1_u, \dots, \mathcal{IC}^r_u)$, is referred to as a realization of $\bar{Z}^r$. (Again, a potential cloud configuration
sequence may have many different realizations.)

For a potential cloud configuration $Z$, let $E^r_u(Z)$ be the event that $Z$ is realized by the initiator $u$ in
(round $r$ of) the run of algorithm A. For a potential cloud configuration sequence $\bar{Z}^r$, let $E_u(\bar{Z}^r)$ denote the
event that $\bar{Z}^r$ is realized by the initiator $u$ in (the first $r$ rounds of) the run of algorithm A.

Lemma 3. Restrict attention to executions of algorithm A that satisfy event $N$, namely, in which all stable
influence clouds are disjoint. Then $P[E_u(\bar{Z}^r)] = P[E_v(\bar{Z}^r)]$ for every $r$, every potential cloud
configuration sequence $\bar{Z}^r$, and every two initiators $u$ and $v$.

Proof. The proof is by induction on $r$. Initially, in round 1, all possible influence clouds of algorithm A
are singletons, i.e., their node sets contain just the initiator. Neither $u$ nor $v$ has received any messages
from other nodes. This means that $P[\sigma_1(u) = s] = P[\sigma_1(v) = s]$ for all $s \in \Sigma$, thus any potential cloud
configuration $Z^1 = (s)$ has the same probability of occurring for any initiator, implying the claim.

Assuming that the result holds for round $r - 1 \ge 1$, we show that it still holds for round $r$. Consider a
potential cloud configuration sequence $\bar{Z}^r = (Z^1, \dots, Z^r)$ and two initiators $u$ and $v$. We need to show that
$\bar{Z}^r$ is equally likely to be realized by $u$ and $v$, conditioned on the event $N$. By the inductive hypothesis, the
prefix $\bar{Z}^{r-1} = (Z^1, \dots, Z^{r-1})$ satisfies the claim. Hence it suffices to prove the following. Let $p_u$ be the
probability of the event $E^r_u(Z^r)$ conditioned on the event $N \wedge E_u(\bar{Z}^{r-1})$. Define the probability $p_v$ similarly
for $v$. Then it remains to prove that $p_u = p_v$.

To do that we need to show, for any state $\sigma_j \in Z^r$, that the probability that $w_{u,j}$, the $j$th node in $\mathcal{IC}^r_u$, is
in state $\sigma_j$, conditioned on the event $N \wedge E_u(\bar{Z}^{r-1})$, is the same as the probability that $w_{v,j}$, the $j$th node in
$\mathcal{IC}^r_v$, is in state $\sigma_j$, conditioned on the event $N \wedge E_v(\bar{Z}^{r-1})$.

There are two cases to be considered. The first is that the potential influence cloud $Z^{r-1}$ has $j$ or more
states. Then by our assumption that events $E_u(\bar{Z}^{r-1})$ and $E_v(\bar{Z}^{r-1})$ hold, the nodes $w_{u,j}$ and $w_{v,j}$ were
already in $u$'s and $v$'s influence clouds, respectively, at the end of round $r - 1$. The node $w_{u,j}$ changes its
state from its previous state, $\sigma'_j$, to $\sigma_j$ on round $r$ as the result of receiving some messages $M_1, \dots, M_\ell$ from
neighbors $x^u_1, \dots, x^u_\ell$ in $u$'s influence cloud $\mathcal{IC}^{r-1}_u$, respectively. In turn, node $x^u_j$ sends message $M_j$ to $w_{u,j}$
on round $r$ as the result of being in a certain state $\sigma_r(x^u_j)$ at the beginning of round $r$ (or equivalently, at
the end of round $r - 1$) and making a certain random choice (with a certain probability $q_j$ for sending $M_j$
to $w_{u,j}$). But if one assumes that the event $E_v(\bar{Z}^{r-1})$ holds, namely, that $\bar{Z}^{r-1}$ is realized by the initiator $v$,
then the corresponding nodes $x^v_1, \dots, x^v_\ell$ in $v$'s influence cloud $\mathcal{IC}^{r-1}_v$ will be in the same respective states
($\sigma_r(x^v_j) = \sigma_r(x^u_j)$ for every $j$) at the end of round $r - 1$, and therefore will send the messages $M_1, \dots, M_\ell$
to the node $w_{v,j}$ with the same probabilities $q_j$. Also, at the end of round $r - 1$, the node $w_{v,j}$ is in the same
state $\sigma'_j$ as $w_{u,j}$ (assuming event $E_v(\bar{Z}^{r-1})$). It follows that the node $w_{v,j}$ changes its state to $\sigma_j$ on round $r$
with the same probability as the node $w_{u,j}$.

The second case to be considered is when the potential influence cloud $Z^{r-1}$ has fewer than $j$ states. This
means (conditioned on the events $E_u(\bar{Z}^{r-1})$ and $E_v(\bar{Z}^{r-1})$ respectively) that the nodes $w_{u,j}$ and $w_{v,j}$ were
not in the respective influence clouds at the end of round $r - 1$. Rather, they were both passive nodes. By an
argument similar to that made for round 1, any pair of (so far) passive nodes have equal probability of being
in any state. Hence $P[\sigma_{r-1}(w_{u,j}) = s] = P[\sigma_{r-1}(w_{v,j}) = s]$ for all $s \in \Sigma$. As in the former case, the node
$w_{u,j}$ changes its state from its previous state, $\sigma'_j$, to $\sigma_j$ on round $r$ as the result of receiving some messages
$M_1, \dots, M_\ell$ from neighbors $x^u_1, \dots, x^u_\ell$ that are already in $u$'s influence cloud $\mathcal{IC}^{r-1}_u$, respectively. By a
similar analysis, it follows that the node $w_{v,j}$ changes its state to $\sigma_j$ on round $r$ with the same probability as
the node $w_{u,j}$. □

We now conclude that for every potential cloud configuration $Z$, every execution $\alpha$ and every two
initiators $u$ and $v$, the events $E^{\rho}_u(Z)$ and $E^{\rho}_v(Z)$ are equally likely. More specifically, we say that the potential
cloud configuration $Z$ is equi-probable for initiators $u$ and $v$ if $P[E^{\rho}_u(Z) \mid N] = P[E^{\rho}_v(Z) \mid N]$. Although
a potential cloud configuration $Z$ may be the end-cloud of many different potential cloud configuration
sequences, and each such potential cloud configuration sequence may have many different realizations, the
above lemma implies the following (integrating over all possible choices).

Corollary 2. Restrict attention to executions of algorithm A that satisfy event N, namely, in which all (final) 
stable influence clouds are disjoint. Consider two initiators u and v and a potential cloud configuration Z. 
Then Z is equi-probable for u and v. 

By assumption, algorithm A errs with probability $p_{err} \le 1/n$. Let $S$ be the event that A elects exactly
one leader. We get

$$P[S \mid M \wedge N] \ge P[M \wedge N] - p_{err} = 1 - o(1). \quad (4)$$

Conditioning on event $M \wedge N$, let $X$ be the random variable that represents the number of disjoint influence
clouds generated by algorithm A. By Cor. 2, each of the initiators has the same probability $p$ of generating
a leader cloud. Algorithm A succeeds whenever event $S$ occurs. Its success probability assuming $X = c$ is

$$P[S \mid M \wedge N \wedge (X = c)] = c\,p\,(1 - p)^{c-1}. \quad (5)$$

For any given $c$, the value of (5) is maximized if $p = \frac{1}{c}$, which yields that $P[S \mid M \wedge N \wedge (X = c)] \le 1/e$
for any $c$. It follows that $P[S \mid M \wedge N] \le 1/e$ as well. This, however, is a contradiction to (4) and
completes the proof of Theorem 3 for algorithms without unique identifiers.
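The maximization step can be illustrated numerically. The sketch below evaluates the expression $c\,p\,(1-p)^{c-1}$ for $c \ge 2$ clouds and checks its value at $p = 1/c$; the function names are hypothetical and the code is only a sanity check, not part of the argument. At $p = 1/c$ the expression equals $(1 - 1/c)^{c-1}$, which is $1/2$ for $c = 2$ and decreases toward $1/e \approx 0.368$ as $c$ grows, so in the regime of many clouds it is a constant well below 1.

```python
import math


def success_probability(c: int, p: float) -> float:
    """Probability that exactly one of c clouds elects a leader, each with probability p."""
    return c * p * (1.0 - p) ** (c - 1)


def best_success_probability(c: int) -> float:
    """Value of success_probability at its maximizer p = 1/c."""
    return success_probability(c, 1.0 / c)


if __name__ == "__main__":
    for c in (2, 3, 10, 100, 10**6):
        print(c, round(best_success_probability(c), 6))
    print("1/e =", round(1.0 / math.e, 6))
```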

We now briefly argue why our result holds for any algorithm B that runs in a model where nodes are
equipped with unique ids (chosen by the adversary). Suppose that, w.h.p., B succeeds in electing a leader
while sending only $f(n)$ messages. Now consider an algorithm B' in our model that is identical to B with
the difference that before performing any other computation, every node generates a random number from
the range $[1, \dots, n^4]$ and uses this value instead of the unique id. Let $I$ be the event that all node ids are
distinct; clearly $I$ happens with high probability. Therefore, by the success probability of B, it follows that
B' also succeeds with probability $1 - o(1)$ (conditioned on $I$), which contradicts our result for algorithms
without unique ids. This completes the proof of Theorem 3. □
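The claim that randomly drawn ids are distinct with high probability follows from a union bound over pairs of nodes. The sketch below is a hedged illustration with hypothetical function names: `collision_bound` computes the union bound $\binom{n}{2}/n^4 < \frac{1}{2n^2}$, and `draw_ids` mimics the id-generation step of B' using Python's pseudo-random generator in place of the nodes' private coins.

```python
import random


def collision_bound(n: int) -> float:
    """Union bound on the probability that two of n ids from [1, n^4] coincide."""
    pairs = n * (n - 1) // 2
    return pairs / n ** 4


def draw_ids(n: int, rng: random.Random) -> list:
    """Each node independently picks an id uniformly from [1, n^4]."""
    return [rng.randint(1, n ** 4) for _ in range(n)]


if __name__ == "__main__":
    n = 1000
    ids = draw_ids(n, random.Random(0))
    print("all distinct:", len(set(ids)) == n)
    print("collision bound:", collision_bound(n))
```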



5 Conclusion 

We studied the role played by randomization in distributed leader election. Some open questions on randomized
leader election are raised by our work: (1) Can we find (universal) upper and lower bounds for general
graphs? (2) Is $\Omega(\sqrt{n})$ a lower bound on the number of messages needed for a complete graph, regardless of
the number of rounds?